A First Approach to the Automatic Detection of Zero Subjects and Impersonal Constructions in Portuguese
نویسندگان
چکیده
In this paper we present a first approximation to the automatic detection of zero subjects and impersonal constructions in Brazilian Portuguese. To the best of our knowledge, this is the first attempt of approaching such task using machine learning in Portuguese. We compiled a corpus containing more than 5,600 instances annotated with the classes to be identified: explicit subjects, zero subjects or pronouns and impersonal constructions. We applied machine learning using linguistically motivated features to classify the instances. The results are modest but promising and provide guidance for future work.
منابع مشابه
Elliphant: Improved Automatic Detection of Zero Subjects and Impersonal Constructions in Spanish
In pro-drop languages, the detection of explicit subjects, zero subjects and nonreferential impersonal constructions is crucial for anaphora and co-reference resolution. While the identification of explicit and zero subjects has attracted the attention of researchers in the past, the automatic identification of impersonal constructions in Spanish has not been addressed yet and this work is the ...
متن کاملElliphant: A Machine Learning Method for Identifying Subject Ellipsis and Impersonal Constructions in Spanish
This thesis presents Elliphant, a machine learning system for classifying Spanish subject ellipsis as either referential or non-referential. Linguistically motivated features are incorporated in a system which performs a ternary classification: verbs with explicit subjects, verbs with omitted but referential subjects (zero pronouns), and verbs with no subject (impersonal constructions). To the ...
متن کاملA Portuguese-Spanish Corpus Annotated for Subject Realization and Referentiality
This paper presents a comparable corpus of Portuguese and Spanish consisting of legal and health texts. We describe the annotation of zero subject, impersonal constructions and explicit subjects in the corpus. We annotated 12,492 examples using a scheme that distinguishes between different linguistic levels (phonology, syntax, semantics, etc.) and present a taxonomy of instances on which annota...
متن کاملتولید خودکار الگوهای نفوذ جدید با استفاده از طبقهبندهای تک کلاسی و روشهای یادگیری استقرایی
In this paper, we propose an approach for automatic generation of novel intrusion signatures. This approach can be used in the signature-based Network Intrusion Detection Systems (NIDSs) and for the automation of the process of intrusion detection in these systems. In the proposed approach, first, by using several one-class classifiers, the profile of the normal network traffic is established. ...
متن کاملIntegration of Visible Image and LIDAR Altimetric Data for Semi-Automatic Detection and Measuring the Boundari of Features
This paper presents a new method for detecting the features using LiDAR data and visible images. The proposed features detection algorithm has the lowest dependency on region and the type of sensor used for imaging, and about any input LiDAR and image data, including visible bands (red, green and blue) with high spatial resolution, identify features with acceptable accuracy. In the proposed app...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Procesamiento del Lenguaje Natural
دوره 49 شماره
صفحات -
تاریخ انتشار 2012